3 research outputs found

    ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

    Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very high frequency radio channels. In order to incorporate these novel technologies into ATC (a low-resource domain), large-scale annotated datasets are required to develop data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to a lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, a signal-to-noise ratio estimate and an English language detection score per sample. Both corpora are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus that we offer for free at https://www.atco2.org/data. We expect that the ATCO2 corpus will foster research on robust ASR and NLU, not only in the field of ATC communications but also in the general research community.
    Comment: Manuscript under review; the code will be available at https://github.com/idiap/atco2-corpu
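    To make the per-sample metadata concrete, here is a minimal sketch of filtering ATCO2-PL-set-style samples by their signal-to-noise ratio estimate and English language detection score. The JSON-lines layout and the field names (snr, eld_score, audio, transcript) are assumptions for illustration, not the corpus's documented schema.

```python
import json

def load_samples(path):
    # One metadata record per line, JSON-lines layout (assumed format).
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

def select_usable(samples, min_snr_db=5.0, min_eld=0.5):
    # Keep samples with an acceptable SNR estimate and a high English
    # language detection score, two of the per-sample fields the
    # abstract describes for the ATCO2-PL-set corpus.
    for s in samples:
        if s["snr"] >= min_snr_db and s["eld_score"] >= min_eld:
            yield s["audio"], s["transcript"]
```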

    Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications

    Voice communication is the main channel for exchanging information between pilots and Air-Traffic Controllers (ATCos). Recently, several projects have explored the use of speech recognition technology to automatically extract spoken key information such as call signs, commands, and values, which can be used to reduce ATCos’ workload and increase performance and safety in Air-Traffic Control (ATC)-related activities. Nevertheless, the collection of ATC speech data is very demanding, expensive, and limited by the intrinsic speakers’ characteristics. As a solution, this paper presents ATCO2, a project that aims to develop a unique platform to collect, organize, and pre-process ATC data gathered from the airspace. Initially, the data are collected directly through publicly accessible radio frequency channels with VHF receivers and LiveATC, which can be considered an “unlimited” source of low-quality data. The ATCO2 project explores employing context information such as radar and air surveillance data (collected with ADS-B and Mode S) from the OpenSky Network (OSN) to correlate call signs automatically extracted from voice communication with those available from ADS-B channels, in order to increase the overall call sign detection rate. More specifically, the timestamp and location of the spoken command (issued by the ATCo by voice) are extracted, and a query is sent to the OSN server to retrieve the call sign tags in ICAO format for the aircraft in the given area. Then, a word sequence provided by an automatic speech recognition system is fed into a Natural Language Processing (NLP) based module together with the set of call signs available from the ADS-B channels. The NLP module extracts the call sign, command, and command arguments from the spoken utterance.
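    As a sketch of the query-and-match step described above, the snippet below retrieves call sign tags for aircraft inside a bounding box from the OpenSky Network REST API (the /states/all endpoint, where the call sign is field 1 of each state vector) and ranks them against an ASR hypothesis. The string-similarity ranking is a simple stand-in for the paper's NLP module, not the authors' actual method.

```python
import difflib
import requests

OSN_URL = "https://opensky-network.org/api/states/all"

def callsigns_in_area(lamin, lomin, lamax, lomax):
    # Query OpenSky for state vectors inside a lat/lon bounding box and
    # return the (non-empty) call sign tags they carry.
    resp = requests.get(OSN_URL, params={
        "lamin": lamin, "lomin": lomin, "lamax": lamax, "lomax": lomax,
    }, timeout=10)
    resp.raise_for_status()
    states = resp.json().get("states") or []
    return [s[1].strip() for s in states if s[1] and s[1].strip()]

def best_callsign(asr_hypothesis, candidates):
    # Rank surveillance call signs by rough string similarity to the
    # recognized utterance; a real system would expand "SWR123" into its
    # spoken form ("swiss one two three") before matching.
    def score(cs):
        return difflib.SequenceMatcher(
            None, cs.lower(), asr_hypothesis.lower()).ratio()
    return max(candidates, key=score) if candidates else None
```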

    Automatic Processing Pipeline for Collecting and Annotating Air-Traffic Voice Communication Data

    This document describes our pipeline for the automatic processing of ATCO-pilot audio communication, developed as part of the ATCO2 project. So far, we have collected two thousand hours of audio recordings that we either preprocessed for the transcribers or used for semi-supervised training. Both uses of the collected data can further improve our pipeline by retraining our models. The proposed automatic processing pipeline is a cascade of many standalone components: (a) segmentation, (b) volume control, (c) signal-to-noise ratio filtering, (d) diarization, (e) ‘speech-to-text’ (ASR) module, (f) English language detection, (g) call-sign code recognition, (h) ATCO-pilot classification and (i) highlighting commands and values. The key component of the pipeline is the speech-to-text transcription system, which has to be trained with real-world ATC data; otherwise, the performance is poor. To further improve speech-to-text performance, we apply both semi-supervised training on our recordings and contextual adaptation, which uses a list of plausible call signs from surveillance data as auxiliary information. Downstream NLP/NLU tasks are important from an application point of view. These tasks need accurate models operating on top of the real speech-to-text output; thus, there is a need for more data here too. Creating ATC data is the main aspiration of the ATCO2 project. At the end of the project, the data will be packaged and distributed by ELDA.
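    The cascade structure lends itself to a simple skeleton: each component consumes a segment and either passes it on or drops it. The sketch below illustrates that shape under stated assumptions; the stage bodies are placeholders, not the project's actual components.

```python
def snr_filter(segment, threshold_db=0.0):
    # Drop segments whose estimated signal-to-noise ratio is too low.
    return segment if segment.get("snr_db", 0.0) >= threshold_db else None

def asr(segment):
    # Placeholder 'speech-to-text' stage; a real decoder could be biased
    # with plausible call signs from surveillance data (the contextual
    # adaptation described above).
    segment["transcript"] = "<decoded text>"
    return segment

# Cascade order follows the paper; diarization, English language
# detection, call-sign recognition, etc. would slot in the same way.
STAGES = [snr_filter, asr]

def run_pipeline(segments):
    out = []
    for seg in segments:
        for stage in STAGES:
            seg = stage(seg)
            if seg is None:   # a stage filtered this segment out
                break
        else:
            out.append(seg)
    return out
```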